Expert-Agnostic Learning to Defer
Strong, Joshua, Saha, Pramit, Ibrahim, Yasin, Ouyang, Cheng, Noble, Alison
Recent advancements in this field have introduced features enabling flexibility to unseen experts at test-time, but we find these approaches have significant limitations. To address these, we introduce EA-L2D: Expert-Agnostic Learning to Defer, a novel L2D framework that leverages a Bayesian approach to model expert behaviour in an expert-agnostic manner, facilitating optimal deferral decisions. EA-L2D offers several critical improvements over prior methods, including the ability to incorporate prior knowledge about experts, a reduced reliance on expert-annotated data, and robust performance when deferring to experts with expertise not seen during training.
[...] including the development of consistent surrogate losses for training these systems (Mozannar & Sontag, 2021; Verma & Nalisnick, 2022), and extensions that allow for deferral to multiple experts (Verma et al., 2023). Recent work by Tailor et al. (2024) proposed a meta-learning solution for L2D systems that can adapt to experts not seen during the training regime through meta-learning representations of expert behaviours, enabling the system to quickly adapt to new experts using a small set of their example predictions, denoted context predictions. However, this approach exhibits a key weakness in limited generalisation to experts with expertise unseen during training. Additionally, their solution poses problems seen more widely in L2D literature, [...]
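The EA-L2D text above mentions modelling expert behaviour with a Bayesian approach and incorporating prior knowledge about experts. The sketch below illustrates that general idea only, using a standard Beta-Bernoulli model of expert accuracy. It is not the paper's model: the function names, the Beta prior, and the simple "defer when the expert looks more accurate" rule are all assumptions made for illustration.

```python
def posterior_accuracy(correct, total, prior_a=1.0, prior_b=1.0):
    """Posterior mean accuracy of an expert under a Beta(prior_a, prior_b) prior.

    `correct` of the expert's `total` observed predictions were right.
    The prior pseudo-counts are a hypothetical way to encode prior knowledge.
    """
    if not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total")
    return (prior_a + correct) / (prior_a + prior_b + total)


def should_defer(classifier_confidence, correct, total, prior_a=1.0, prior_b=1.0):
    """Illustrative rule (an assumption): defer when the expert's estimated
    accuracy exceeds the classifier's confidence in its own prediction."""
    return posterior_accuracy(correct, total, prior_a, prior_b) > classifier_confidence


# With no observations, the estimate falls back to the prior mean.
no_data_estimate = posterior_accuracy(0, 0)
# After 8 correct out of 10, a uniform prior gives (1 + 8) / (2 + 10).
estimate_after_data = posterior_accuracy(8, 10)
```

With a stronger prior (larger pseudo-counts), fewer expert-annotated examples are needed before the estimate stabilises, which is one way to read the abstract's claim about reduced reliance on expert-annotated data.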
Bridging Information-Theoretic and Geometric Compression in Language Models
Cheng, Emily, Kervadec, Corentin, Baroni, Marco
For a language model (LM) to faithfully model human language, it must compress vast, potentially infinite information into relatively few dimensions. We propose analyzing compression in (pre-trained) LMs from two points of view: geometric and information-theoretic. We demonstrate that the two views are highly correlated, such that the intrinsic geometric dimension of linguistic data predicts their coding length under the LM. We then show that, in turn, high compression of a linguistic dataset predicts rapid adaptation to that dataset, confirming that being able to compress linguistic information is an important part of successful LM performance. As a practical byproduct of our analysis, we evaluate a battery of intrinsic dimension estimators for the first time on linguistic data, showing that only some encapsulate the relationship between information-theoretic compression, geometric compression, and ease-of-adaptation.
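As a rough illustration of the information-theoretic view, the sketch below computes a coding length from per-token probabilities. It assumes that "coding length under the LM" means the Shannon code length, i.e. the negative log-likelihood in bits; the function name and the probabilities are hypothetical and not taken from the paper.

```python
import math


def coding_length_bits(token_probs):
    """Total Shannon code length, in bits, of a sequence, given the
    probability a model assigned to each observed token."""
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return sum(-math.log2(p) for p in token_probs)


# Hypothetical per-token probabilities for a three-token sequence.
probs = [0.5, 0.25, 0.125]
total_bits = coding_length_bits(probs)    # 1 + 2 + 3 bits
bits_per_token = total_bits / len(probs)  # average code length per token
```

A dataset that the model compresses well has a low average number of bits per token under this measure.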
Masked Trajectory Models for Prediction, Representation, and Control
Wu, Philipp, Majumdar, Arjun, Stone, Kevin, Lin, Yixin, Mordatch, Igor, Abbeel, Pieter, Rajeswaran, Aravind
We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making. MTM takes a trajectory, such as a state-action sequence, and aims to reconstruct the trajectory conditioned on random subsets of the same trajectory. By training with a highly randomized masking pattern, MTM learns versatile networks that can take on different roles or capabilities, by simply choosing appropriate masks at inference time. For example, the same MTM network can be used as a forward dynamics model, inverse dynamics model, or even an offline RL agent. Through extensive experiments in several continuous control tasks, we show that the same MTM network -- i.e. same weights -- can match or outperform specialized networks trained for the aforementioned capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate the learning speed of traditional RL algorithms. Finally, in offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite MTM being a generic self-supervised learning method without any explicit RL components. Code is available at https://github.com/facebookresearch/mtm
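To make the idea of choosing masks at inference time concrete, the sketch below builds visibility masks over a state-action trajectory. It is a minimal illustration under assumptions: the token layout, the function names, and the exact mask patterns are hypothetical, and no network is involved.

```python
def make_mask(horizon, visible):
    """Map each token ('s' or 'a', timestep) to True if it is shown to the
    model and False if the model would have to reconstruct it."""
    tokens = [(kind, t) for t in range(horizon) for kind in ("s", "a")]
    return {tok: tok in visible for tok in tokens}


def forward_dynamics_mask(horizon, t):
    # Show s_t and a_t; the model would be asked to fill in s_{t+1}.
    return make_mask(horizon, {("s", t), ("a", t)})


def inverse_dynamics_mask(horizon, t):
    # Show s_t and s_{t+1}; the model would be asked to fill in a_t.
    return make_mask(horizon, {("s", t), ("s", t + 1)})


fwd = forward_dynamics_mask(horizon=3, t=0)
inv = inverse_dynamics_mask(horizon=3, t=0)
```

The same trained network would receive either mask, so the role it plays depends only on which tokens are hidden.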
Adaptive Dropout Rates for Learning with Corrupted Features
Zhuo, Jingwei (Tsinghua University) | Zhu, Jun (Tsinghua University) | Zhang, Bo (Tsinghua University)
Feature noising is an effective mechanism for reducing the risk of overfitting. To avoid an exploding search space, existing work typically assumes that all features share a single noise level, which is often chosen by cross-validation. In this paper, we present a Bayesian feature noising model that flexibly allows for dimension-specific or group-specific noise levels, and we derive a learning algorithm that adaptively updates these noise levels. Our adaptive rule is simple and interpretable, drawing a direct connection to the fitness of each individual feature or feature group. Empirical results on various datasets demonstrate the model's effectiveness in avoiding extensive tuning and, in some cases, improving performance thanks to its flexibility.
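The sketch below shows what dimension-specific noise levels can look like for dropout-style feature noising. It only covers applying the noise; the adaptive update of the rates is not described in the abstract and is not attempted here. The function name and the rescaling by 1 / (1 - q), which keeps each feature unbiased in expectation, are assumptions.

```python
import random


def noise_features(x, drop_rates, rng):
    """Apply dropout noise with a separate drop rate per feature.

    Feature d is zeroed with probability drop_rates[d]; otherwise it is
    rescaled by 1 / (1 - drop_rates[d]) so its expected value is unchanged.
    """
    if len(x) != len(drop_rates):
        raise ValueError("x and drop_rates must have the same length")
    noised = []
    for value, q in zip(x, drop_rates):
        if not 0.0 <= q < 1.0:
            raise ValueError("drop rates must lie in [0, 1)")
        keep = rng.random() >= q  # true with probability 1 - q
        noised.append(value / (1.0 - q) if keep else 0.0)
    return noised


rng = random.Random(0)  # seeded for reproducibility
noised = noise_features([1.0, 2.0, 4.0], [0.0, 0.5, 0.75], rng)
```

A feature with rate 0.0 always passes through unchanged, while a feature with a high rate is usually zeroed and strongly amplified when kept.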
Extended Property Paths: Writing More SPARQL Queries in a Succinct Way
Fionda, Valeria (University of Calabria) | Pirrò, Giuseppe (University of Koblenz-Landau) | Consens, Mariano P. (University of Toronto)
We introduce Extended Property Paths (EPPs), a significant enhancement of SPARQL property paths. EPPs capture, in a succinct way, a larger class of navigational queries than property paths. We present the syntax and formal semantics of EPPs and introduce two different evaluation strategies. The first is based on an algorithm implemented in a custom query processor. The second strategy leverages an algorithm that translates EPPs into SPARQL queries, which can then be executed on existing SPARQL processors. We compare the two evaluation strategies on real data to highlight their pros and cons.
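For readers unfamiliar with navigational queries, the sketch below evaluates the standard SPARQL 1.1 path operator `+` (one or more steps along a predicate) over an in-memory list of triples. It illustrates plain property paths only, not EPP syntax or either of the paper's evaluation strategies; the function name and the data are hypothetical.

```python
from collections import deque


def eval_one_or_more(triples, start, predicate):
    """Return the nodes reachable from `start` by following `predicate`
    one or more times, mirroring the SPARQL 1.1 path `predicate+`."""
    edges = {}
    for s, p, o in triples:
        if p == predicate:
            edges.setdefault(s, []).append(o)
    reached = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in reached:
            continue  # already visited; also guards against cycles
        reached.add(node)
        queue.extend(edges.get(node, []))
    return reached


triples = [("a", "knows", "b"), ("b", "knows", "c"), ("c", "likes", "d")]
friends = eval_one_or_more(triples, "a", "knows")
```

Here `friends` contains "b" and "c" but not "d", because the last edge uses a different predicate.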